<html>
<body style="width:800px;">
<h2>
    Tentative title: Profiling adventures and cython - Introducing cython.
</h2>

<p>
    In the previous blog post, I made some attempts at speeding up the
    function <code>mandel()</code> by making changes in the Python code.
    While I had some success in doing so, it was clearly not enough
    for my purpose.  As a result, I will now try to use cython.  Before
    I do this, I note again the result from the last profiling run,
    limiting the information to the 5 longest-running functions or methods.
</p>

<pre>
       3673150 function calls in 84.807 CPU seconds

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
3168000   73.210    0.000   73.210    0.000 mandel1c.py:7(mandel)
     11   10.855    0.987   84.204    7.655 viewer1.py:23(draw_fractal)
      1    0.593    0.593    0.593    0.593 {_tkinter.create}
 504530    0.137    0.000    0.137    0.000 viewer1.py:17(draw_pixel)
     37    0.009    0.000    0.009    0.000 {built-in method call}
</pre>

<p>
    The goal of cython could be described as providing an easy way to convert
    a Python module into a C extension.  This is what I will
    do.  [There are other ways to work with cython extensions than
    what I use here; for more information, please
    consult the <a href="http://cython.org/">cython web site</a>.]
    <b>Note that I am NOT a cython expert; this is only the first project
    for which I use cython.</b>  While I am not interested in creating an
    application for distribution, and hence do not use the setup method
    for cython, it is quite possible that there are better
    ways to use cython than what I explore here.
</p>

<p>
    I first start by taking my existing module and copying it into a new
    file, with a ".pyx" extension instead of the traditional ".py".
</p>

<pre>
# mandel2cy.pyx
# cython: profile=True

def mandel(c, max_iterations=20):
    '''determines if a point is in the Mandelbrot set based on deciding if,
       after a maximum allowed number of iterations, the absolute value of
       the resulting number is greater or equal to 2.'''
    z = 0
    for i in range(0, max_iterations):
        z = z**2 + c
        if abs(z) >= 4:
            return False
    return abs(z) < 2
</pre>

<p>
    Note that I have removed the equivalence between
    <code>range</code> and <code>xrange</code>.  The reason I have done this
    is because with <code>xrange</code> present like this in the file
    results in a compilation
    error when running cython with Python 3.1.  Furthermore, as will
    be seen later, it is not really needed even for Python 2.x when using
    cython properly.
</p>

<p>
    I have also included a commented line stating that 'profile' was equal
    to True; this is a cython directive that will enable the Python profiler
    to also include cython functions in its tally.
</p>

<p>
    In order to import this module, I also need to modify the viewer to
    import the cython module. Here is the new version.
</p>

<pre>
# viewer2.py

import pyximport
pyximport.install()

from mandel2_cy import mandel
from viewer import Viewer
import time

import sys
if sys.version_info < (3,):
    import Tkinter as tk
    range = xrange
else:
    import tkinter as tk

class FancyViewer(Viewer):
    '''Application to display fractals'''

    def draw_pixel(self, x, y):
        '''Simulates drawing a given pixel in black by drawing a black line
           of length equal to one pixel.'''
        return
        #self.canvas.create_line(x, y, x+1, y, fill="black")

    def draw_fractal(self):
        '''draws a fractal on the canvas'''
        self.calculating = True
        begin = time.time()
        # clear the canvas
        self.canvas.create_rectangle(0, 0, self.canvas_width,
                                    self.canvas_height, fill="white")
        for x in range(0, self.canvas_width):
            real = self.min_x + x*self.pixel_size
            for y in range(0, self.canvas_height):
                imag = self.min_y + y*self.pixel_size
                c = complex(real, imag)
                if mandel(c, self.nb_iterations):
                    self.draw_pixel(x, self.canvas_height - y)
        self.status.config(text="Time required = %.2f s  [%s iterations]  %s" %(
                                (time.time() - begin), self.nb_iterations,
                                                                self.zoom_info))
        self.status2.config(text=self.info())
        self.calculating = False

if __name__ == "__main__":
    root = tk.Tk()
    app = FancyViewer(root)
    root.mainloop()
</pre>

<p>
    Other than the top few lines, nothing has changed.  Time to run
    the profiler with this new version.
</p>

<pre>
       6841793 function calls in 50.145 CPU seconds

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
3168000   35.913    0.000   35.913    0.000 mandel2_cy.pyx:4(mandel)
     11   10.670    0.970   48.754    4.432 viewer2.py:26(draw_fractal)
3168000    2.001    0.000   37.914    0.000 {mandel2_cy.mandel}
      1    1.356    1.356    1.356    1.356 {_tkinter.create}
 505173    0.167    0.000    0.167    0.000 viewer2.py:20(draw_pixel)
</pre>


<p>
    A reduction from 84 to 50 seconds; cython must be doing something
    right!  Note that the calls to <code>abs()</code> have been eliminated
    by using cython.  All I did is import the module via Python without
    making any other change to the code.
</p>

<p>
    Note also that <code>mandel</code> appears twice: once (the longest running) as
    the function defined on line 8 of mandel2_cy.pyx, and once as
    a object belonging to the module mandel2_cy.  I will come back to this
    later but, for now, I will do some changes to help cython do even better.
</p>

<p>
    As mentioned before, cython is a tool to help create C extensions.
    One of the differences between C and Python is that variables
    have a declared type in C.  If one tells cython about what type a given
    variable is, cython can often use that information to make the code run
    faster.  As an example, I know that two of the variables are of type
    integers which is a native C type; I can
    <a href="http://docs.cython.org/src/quickstart/cythonize.html">add this information</a>
    as follows.
</p>

<pre>
# mandel2a_cy.pyx
# cython: profile=True

def mandel(c, int max_iterations=20):
    '''determines if a point is in the Mandelbrot set based on deciding if,
       after a maximum allowed number of iterations, the absolute value of
       the resulting number is greater or equal to 2.'''
    cdef int i
    z = 0
    for i in range(0, max_iterations):
        z = z**2 + c
        if abs(z) >= 2:
            return False
    return abs(z) < 2
</pre>


<p>
    Running the profiler with this change yields the following:
</p>

<pre>
       6841793 function calls in 39.860 CPU seconds

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
3168000   27.431    0.000   27.431    0.000 mandel2a_cy.pyx:4(mandel)
     11    9.869    0.897   39.339    3.576 viewer2.py:26(draw_fractal)
3168000    1.906    0.000   29.337    0.000 {mandel2a_cy.mandel}
      1    0.511    0.511    0.511    0.511 {_tkinter.create}
 505173    0.131    0.000    0.131    0.000 viewer2.py:20(draw_pixel)
</pre>

<p>
    Another significant time reduction, this time of the order of 20%.
    And we didn't tell cython that "z" and "c" are complex yet.
</p>

<p>
    Actually, C does not have a complex data type.  So, I can choose one
    of two strategies:
</p>

<ol>
    <li>
        I can change the code so that I deal only with real numbers,
        by working myself how to multiply and add complex numbers.
    </li>
    <li>
        I can use some
        <a href="http://docs.cython.org/src/userguide/extension_types.html#external-extension-types">
        special cython technique</a> to extract all the
        relevant information about the Python built-in complex data type
        without changing the code inside the function (other than
        adding some type declaration).
    </li>
</ol>

<p>
    I will choose the second of these methods and see what it gives. The
    required changes are as follows:
</p>

<pre>
# mandel2b_cy.pyx
# cython: profile=True

cdef extern from "complexobject.h":

    struct Py_complex:
        double real
        double imag

    ctypedef class __builtin__.complex [object PyComplexObject]:
        cdef Py_complex cval


def mandel(complex c, int max_iterations=20):
    '''determines if a point is in the Mandelbrot set based on deciding if,
       after a maximum allowed number of iterations, the absolute value of
       the resulting number is greater or equal to 2.'''
    cdef int i
    cdef complex z

    z = 0. + 0.j

    for i in range(0, max_iterations):
        z = z**2 + c
        if abs(z) >= 2:
            return False
    return abs(z) < 2
</pre>

<p>
    The timing results are the following:
</p>

<pre>
       6841793 function calls in 38.424 CPU seconds

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
3168000   26.771    0.000   26.771    0.000 mandel2b_cy.pyx:14(mandel)
     11    9.435    0.858   38.209    3.474 viewer2.py:26(draw_fractal)
3168000    1.865    0.000   28.636    0.000 {mandel2b_cy.mandel}
      1    0.205    0.205    0.205    0.205 {_tkinter.create}
 505173    0.136    0.000    0.136    0.000 viewer2.py:20(draw_pixel)
</pre>

<p>
    The time difference between this run and the previous one is within
    the variation I observe from one profiling run to the next (using exactly
    the same program).  Therefore, I conclude that this latest attempt
    didn't speed up the code.  It is possible that I have overlooked something
    to ensure that cython could make use of the information about the
    complex datatype more efficiently ...
    It seems like I need a different strategy. I will resort
    to doing the complex algebra myself, and work only with real numbers.
    Here's the modified code for the mandel module.
</p>

<pre>
# mandel2c_cy.pyx
# cython: profile=True

def mandel(double real, double imag, int max_iterations=20):
    '''determines if a point is in the Mandelbrot set based on deciding if,
       after a maximum allowed number of iterations, the absolute value of
       the resulting number is greater or equal to 2.'''
    cdef double z_real = 0., z_imag = 0.
    cdef int i

    for i in range(0, max_iterations):
        z_real, z_imag = ( z_real*z_real - z_imag*z_imag + real,
                     2*z_real*z_imag + imag )
        if (z_real*z_real + z_imag*z_imag) >= 4:
            return False
    return (z_real*z_real + z_imag*z_imag) < 4
</pre>

<p>
    I also change the call within <code>draw_fractal()</code> so that
    I don't use complex variables.  The result is extremely encouraging:
</p>

<pre>
       6841793 function calls in 7.205 CPU seconds

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     11    4.379    0.398    7.066    0.642 viewer2a.py:26(draw_fractal)
3168000    1.557    0.000    2.570    0.000 {mandel2c_cy.mandel}
3168000    1.013    0.000    1.013    0.000 mandel2c_cy.pyx:4(mandel)
      1    0.130    0.130    0.130    0.130 {_tkinter.create}
 505173    0.114    0.000    0.114    0.000 viewer2a.py:20(draw_pixel)
</pre>

<p>
    This total execution time has been reduced from 38 to 7 seconds.
    <code>mandel()</code> is no longer the largest contributor to the
    overall execution time; <code>draw_fractal()</code> is.  However,
    the program is still a bit too slow: without actually doing any drawing,
    it takes approximately 0.6 seconds to generate one fractal image.
    However, I can do better.  Looking at the code, I notice that
    <code>draw_fractal()</code> contains two embedded for loops, resulting to
    all those calls to
    <code>mandel()</code>.  Remember how telling cython about integer types
    used in loops sped up the code?   This suggest that perhaps I should
    do something similar and move some
    of the code of <code>draw_fractal()</code> to the cython module.
    Here's a modified viewer module.
</p>

<pre>
# viewer2b.py

import pyximport
pyximport.install()

from mandel2d_cy import create_fractal
from viewer import Viewer
import time

import sys
if sys.version_info < (3,):
    import Tkinter as tk
    range = xrange
else:
    import tkinter as tk

class FancyViewer(Viewer):
    '''Application to display fractals'''

    def draw_fractal(self):
        '''draws a fractal on the canvas'''
        self.calculating = True
        begin = time.time()
        # clear the canvas
        self.canvas.create_rectangle(0, 0, self.canvas_width,
                                    self.canvas_height, fill="white")
        create_fractal(self.canvas_width, self.canvas_height,
                       self.min_x, self.min_y, self.pixel_size,
                       self.nb_iterations, self.canvas)
        self.status.config(text="Time required = %.2f s  [%s iterations]  %s" %(
                                (time.time() - begin), self.nb_iterations,
                                                                self.zoom_info))
        self.status2.config(text=self.info())
        self.calculating = False

if __name__ == "__main__":
    root = tk.Tk()
    app = FancyViewer(root)
    root.mainloop()
</pre>

<p>
    And here is the new cython module, without any additional type declaration.
</p>

<pre>
# mandel2d_cy.pyx
# cython: profile=True

def mandel(double real, double imag, int max_iterations=20):
    '''determines if a point is in the Mandelbrot set based on deciding if,
       after a maximum allowed number of iterations, the absolute value of
       the resulting number is greater or equal to 2.'''
    cdef double z_real = 0., z_imag = 0.
    cdef int i

    for i in range(0, max_iterations):
        z_real, z_imag = ( z_real*z_real - z_imag*z_imag + real,
                     2*z_real*z_imag + imag )
        if (z_real*z_real + z_imag*z_imag) >= 4:
            return False
    return (z_real*z_real + z_imag*z_imag) < 4

def draw_pixel(x, y, canvas):
    '''Simulates drawing a given pixel in black by drawing a black line
       of length equal to one pixel.'''
    return
    #canvas.create_line(x, y, x+1, y, fill="black")

def create_fractal(canvas_width, canvas_height,
                       min_x, min_y, pixel_size,
                       nb_iterations, canvas):
    for x in range(0, canvas_width):
        real = min_x + x*pixel_size
        for y in range(0, canvas_height):
            imag = min_y + y*pixel_size
            if mandel(real, imag, nb_iterations):
                draw_pixel(x, canvas_height - y, canvas)
</pre>

<p>
    The profiling result is as follows:
</p>

<pre>
       3673815 function calls in 3.873 CPU seconds

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     11    2.632    0.239    3.706    0.337 mandel2d_cy.pyx:24(create_fractal)
3168000    1.002    0.000    1.002    0.000 mandel2d_cy.pyx:4(mandel)
      1    0.155    0.155    0.155    0.155 {_tkinter.create}
 505173    0.072    0.000    0.072    0.000 mandel2d_cy.pyx:18(draw_pixel)
     37    0.009    0.000    0.009    0.000 {built-in method call}
</pre>

<p>
    Simply by moving over some of the code to the cython module, I have reduced
    the profiling time to almost half of it previous value.  Looking more
    closely at the profiling results, I also notice that calls to
    <code>mandel()</code> now only appear once; some overhead in calling
    cython functions from python modules has disappeared. Let's see what happens
    if I now add some type information.
</p>

<pre>
def create_fractal(int canvas_width, int canvas_height,
                       double min_x, double min_y, double pixel_size,
                       int nb_iterations, canvas):
    cdef int x, y
    cdef double real, imag

    for x in range(0, canvas_width):
        real = min_x + x*pixel_size
        for y in range(0, canvas_height):
            imag = min_y + y*pixel_size
            if mandel(real, imag, nb_iterations):
                draw_pixel(x, canvas_height - y, canvas)
</pre>

<p>
    The result is only slightly better:
</p>

<pre>
       3673815 function calls in 3.475 CPU seconds

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     11    2.189    0.199    3.308    0.301 mandel2e_cy.pyx:24(create_fractal)
3168000    1.046    0.000    1.046    0.000 mandel2e_cy.pyx:4(mandel)
      1    0.135    0.135    0.135    0.135 {_tkinter.create}
 505173    0.074    0.000    0.074    0.000 mandel2e_cy.pyx:18(draw_pixel)
     37    0.028    0.001    0.028    0.001 {built-in method call}
</pre>


<p>
    However, one thing I remember from the little I know about C it that,
    not only do variables have
    to be declared to be of a certain type, but the same has to be done to
    functions as well.  Here, <code>mandel()</code> has not been declared
    to be of a specific type, so cython assumes it to be a generic
    Python object.  After reading the cython documentation, and noticing
    that <code>mandel()</code> is only called from within the cython
    module, I conclude that not only should I specify the type for
    <code>mandel()</code> but that it probably makes sense to
    specify that it can be "inlined"; I also
    do the same for <code>draw_pixel()</code>.
</p>

<pre>
# mandel2f_cy.pyx
# cython: profile=True

cdef inline bint mandel(double real, double imag, int max_iterations=20):
    '''determines if a point is in the Mandelbrot set based on deciding if,
       after a maximum allowed number of iterations, the absolute value of
       the resulting number is greater or equal to 2.'''
    cdef double z_real = 0., z_imag = 0.
    cdef int i

    for i in range(0, max_iterations):
        z_real, z_imag = ( z_real*z_real - z_imag*z_imag + real,
                     2*z_real*z_imag + imag )
        if (z_real*z_real + z_imag*z_imag) >= 4:
            return False
    return (z_real*z_real + z_imag*z_imag) < 4

cdef inline void draw_pixel(x, y, canvas):
    '''Simulates drawing a given pixel in black by drawing a black line
       of length equal to one pixel.'''
    return
    #canvas.create_line(x, y, x+1, y, fill="black")
</pre>

<p>
    This yields a nice improvement.
</p>

<pre>
       3673815 function calls in 2.333 CPU seconds

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     11    1.190    0.108    2.194    0.199 mandel2f_cy.pyx:24(create_fractal)
3168000    0.930    0.000    0.930    0.000 mandel2f_cy.pyx:4(mandel)
      1    0.127    0.127    0.127    0.127 {_tkinter.create}
 505173    0.074    0.000    0.074    0.000 mandel2f_cy.pyx:18(draw_pixel)
     37    0.009    0.000    0.009    0.000 {built-in method call}
</pre>

<p>
    However... I asked cython to "inline" <code>mandel</code>,
    thus treating them as a pure
    C function.  Yet, they both appear in the Python profiling information, which
    was not the case for <code>abs()</code> once I used cython for the
    first time.  The reason it appears is that cython has been instructed
    to profile all functions in the module, via the directive at the top.
    I can selectively turn off the profiling for an individual function
    by importing the "cython module" and using a special purpose
    decorator as follows.
</p>

<pre>
# mandel2g_cy.pyx
# cython: profile=True

import cython

@cython.profile(False)
cdef inline bint mandel(double real, double imag, int max_iterations=20):
    '''determines if a point is in the Mandelbrot set based on deciding if,
       after a maximum allowed number of iterations, the absolute value of
       the resulting number is greater or equal to 2.'''
    cdef double z_real = 0., z_imag = 0.
    cdef int i

    for i in range(0, max_iterations):
        z_real, z_imag = ( z_real*z_real - z_imag*z_imag + real,
                     2*z_real*z_imag + imag )
        if (z_real*z_real + z_imag*z_imag) >= 4:
            return False
    return (z_real*z_real + z_imag*z_imag) < 4

cdef inline void draw_pixel(x, y, canvas):
    '''Simulates drawing a given pixel in black by drawing a black line
       of length equal to one pixel.'''
    return
    #canvas.create_line(x, y, x+1, y, fill="black")
</pre>

<p>
    The result is even better than I would have expected!
</p>

<pre>
      505519 function calls in 0.817 CPU seconds

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    11    0.605    0.055    0.676    0.061 mandel2g_cy.pyx:27(create_fractal)
     1    0.128    0.128    0.128    0.128 {_tkinter.create}
504877    0.070    0.000    0.070    0.000 mandel2g_cy.pyx:21(draw_pixel)
    37    0.010    0.000    0.010    0.000 {built-in method call}
    11    0.001    0.000    0.678    0.062 viewer2b.py:20(draw_fractal)
</pre>

<p>
    However, increasing the number of iterations to 1000 (from the current
    value of 100 used for testing) does increase the time significantly.
</p>

<pre>
      495235 function calls in 3.872 CPU seconds

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    11    3.653    0.332    3.723    0.338 mandel2g_cy.pyx:27(create_fractal)
     1    0.136    0.136    0.136    0.136 {_tkinter.create}
494593    0.071    0.000    0.071    0.000 mandel2g_cy.pyx:21(draw_pixel)
    37    0.009    0.000    0.009    0.000 {built-in method call}
    11    0.001    0.000    3.726    0.339 viewer2b.py:20(draw_fractal)
</pre>


<p>
   It is probably a good time to put back the drawing to see what the
   overall time profile looks like in a more realistic situation.
</p>

<pre>
      5441165 function calls in 20.747 CPU seconds

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
494604    8.682    0.000   14.427    0.000 Tkinter.py:2135(_create)
    11    3.863    0.351   20.572    1.870 mandel2g_cy.pyx:27(create_fractal)
494632    2.043    0.000    5.326    0.000 Tkinter.py:1046(_options)
494657    1.845    0.000    2.861    0.000 Tkinter.py:77(_cnfmerge)
494593    1.548    0.000   16.709    0.000 mandel2g_cy.pyx:21(draw_pixel)
</pre>

<p>
    Clearly, the limiting time factor is now the Tkinter based drawing, and not
    the other code.  It is time to think of a better drawing strategy.
    However, this will have to wait until next post.
</p>

</body>
</html>
